AITopics | recognizing object

Deep learning is closing the gap with humans on several object recognition benchmarks. Here we investigate this gap in the context of challenging images where objects are seen from unusual viewpoints. We find that humans excel at recognizing objects in unusual poses, in contrast with state-of-the-art pretrained networks (EfficientNet, SWAG, ViT, SWIN, BEiT, ConvNext) which are systematically brittle in this condition. Remarkably, as we limit image exposure time, human performance degrades to the level of deep networks, suggesting that additional mental processes (requiring additional time) take place when humans identify objects in unusual poses. Finally, our analysis of error patterns of humans vs. networks reveals that even time-limited humans are dissimilar to feed-forward deep networks. We conclude that more work is needed to bring computer vision systems to the level of robustness of the human visual system. Understanding the nature of the mental processes taking place during extra viewing time may be key to attain such robustness.

human beat deep network, recognizing object, unusual pose, (14 more...)

arXiv.org Artificial Intelligence

2402.03973

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)
Africa > Senegal > Thiès Region > Thiès (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

TRAFFIC: Recognizing Objects Using Hierarchical Reference Frame Transformations

Neural Information Processing SystemsApr-6-2023, 19:52:55 GMT

We describe a model that can recognize two-dimensional shapes in an unsegmented image, independent of their orientation, position, and scale. The model, called TRAFFIC, efficiently represents the structural relation between an object and each of its component features by encoding the fixed viewpoint-invariant transformation from the feature's reference frame to the object's in the weights of a connectionist network. Using a hierarchy of such transformations, with increasing complexity of features at each successive layer, the network can recognize multiple objects in parallel. An implemen(cid:173) tation of TRAFFIC is described, along with experimental results demonstrating the network's ability to recognize constellations of stars in a viewpoint-invariant manner.

hierarchical reference frame transformation, recognizing object, traffic

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.48)

Add feedback

TRAFFIC: Recognizing Objects Using Hierarchical Reference Frame Transformations

Zemel, Richard S., Mozer, Michael C., Hinton, Geoffrey E.

Neural Information Processing SystemsDec-31-1990

We describe a model that can recognize two-dimensional shapes in an unsegmented image, independent of their orientation, position, and scale. The model, called TRAFFIC, efficiently represents the structural relation between an object and each of its component features by encoding the fixed viewpoint-invariant transformation from the feature's reference frame to the object's in the weights of a connectionist network. Using a hierarchy of such transformations, with increasing complexity of features at each successive layer, the network can recognize multiple objects in parallel. An implementation of TRAFFIC is described, along with experimental results demonstrating the network's ability to recognize constellations of stars in a viewpoint-invariant manner. 1 INTRODUCTION A key goal of machine vision is to recognize familiar objects in an unsegmented image, independent of their orientation, position, and scale. Massively parallel models have long been used for lower-level vision tasks, such as primitive feature extraction and stereo depth. Models addressing "higher-level" vision have generally been restricted to pattern matching types of problems, in which much of the inherent complexity of the domain has been eliminated or ignored.

reference frame, traffic, transformation, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.30)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.05)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)

Add feedback

TRAFFIC: Recognizing Objects Using Hierarchical Reference Frame Transformations

Zemel, Richard S., Mozer, Michael C., Hinton, Geoffrey E.

Neural Information Processing SystemsDec-31-1990

We describe a model that can recognize two-dimensional shapes in an unsegmented image, independent of their orientation, position, and scale. The model, called TRAFFIC, efficiently represents the structural relation between an object and each of its component features by encoding the fixed viewpoint-invariant transformation from the feature's reference frame to the object's in the weights of a connectionist network. Using a hierarchy of such transformations, with increasing complexity of features at each successive layer, the network can recognize multiple objects in parallel. An implementation of TRAFFIC is described, along with experimental results demonstrating the network's ability to recognize constellations of stars in a viewpoint-invariant manner. 1 INTRODUCTION A key goal of machine vision is to recognize familiar objects in an unsegmented image, independent of their orientation, position, and scale. Massively parallel models have long been used for lower-level vision tasks, such as primitive feature extraction and stereo depth. Models addressing "higher-level" vision have generally been restricted to pattern matching types of problems, in which much of the inherent complexity of the domain has been eliminated or ignored.

reference frame, traffic, transformation, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.30)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.05)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)

Add feedback

TRAFFIC: Recognizing Objects Using Hierarchical Reference Frame Transformations

Zemel, Richard S., Mozer, Michael C., Hinton, Geoffrey E.

Neural Information Processing SystemsDec-31-1990

We describe a model that can recognize two-dimensional shapes in an unsegmented image, independent of their orientation, position, and scale. The model, called TRAFFIC, efficiently represents the structural relation between an object and each of its component features by encoding the fixed viewpoint-invariant transformation from the feature's reference frame to the object's in the weights of a connectionist network. Using a hierarchy of such transformations, with increasing complexity of features at each successive layer, the network can recognize multiple objects in parallel. An implementation ofTRAFFIC is described, along with experimental results demonstrating the network's ability to recognize constellations of stars in a viewpoint-invariant manner. 1 INTRODUCTION A key goal of machine vision is to recognize familiar objects in an unsegmented image, independent of their orientation, position, and scale. Massively parallel models have long been used for lower-level vision tasks, such as primitive feature extraction and stereo depth.

artificial intelligence, machine learning, reference frame, (19 more...)

Neural Information Processing Systems

Country: